Search Results: "manu"

11 March 2024

Joachim Breitner: Convenient sandboxed development environment

I like using one machine and setup for everything, from serious development work to hobby projects to managing my finances. This is very convenient, as often the lines between these are blurred. But it is also scary if I think of the large number of people who I have to trust to not want to extract all my personal data. Whenever I run a cabal install, or a fun VSCode extension gets updated, or anything like that, I am running code that could be malicious or buggy. In a way it is surprising and reassuring that, as far as I can tell, this commonly does not happen. Most open source developers out there seem to be nice and well-meaning, after all.

Convenient or it won t happen Nevertheless I thought I should do something about this. The safest option would probably to use dedicated virtual machines for the development work, with very little interaction with my main system. But knowing me, that did not seem likely to happen, as it sounded like a fair amount of hassle. So I aimed for a viable compromise between security and convenient, and one that does not get too much in the way of my current habits. For instance, it seems desirable to have the project files accessible from my unconstrained environment. This way, I could perform certain actions that need access to secret keys or tokens, but are (unlikely) to run code (e.g. git push, git pull from private repositories, gh pr create) from the outside , and the actual build environment can do without access to these secrets. The user experience I thus want is a quick way to enter a development environment where I can do most of the things I need to do while programming (network access, running command line and GUI programs), with access to the current project, but without access to my actual /home directory. I initially followed the blog post Application Isolation using NixOS Containers by Marcin Sucharski and got something working that mostly did what I wanted, but then a colleague pointed out that tools like firejail can achieve roughly the same with a less global setup. I tried to use firejail, but found it to be a bit too inflexible for my particular whims, so I ended up writing a small wrapper around the lower level sandboxing tool https://github.com/containers/bubblewrap.

Selective bubblewrapping This script, called dev and included below, builds a new filesystem namespace with minimal /proc and /dev directories, it s own /tmp directories. It then binds-mound some directories to make the host s NixOS system available inside the container (/bin, /usr, the nix store including domain socket, stuff for OpenGL applications). My user s home directory is taken from ~/.dev-home and some configuration files are bind-mounted for convenient sharing. I intentionally don t share most of the configuration for example, a direnv enable in the dev environment should not affect the main environment. The X11 socket for graphical applications and the corresponding .Xauthority file is made available. And finally, if I run dev in a project directory, this project directory is bind mounted writable, and the current working directory is preserved. The effect is that I can type dev on the command line to enter dev mode rather conveniently. I can run development tools, including graphical ones like VSCode, and especially the latter with its extensions is part of the sandbox. To do a git push I either exit the development environment (Ctrl-D) or open a separate terminal. Overall, the inconvenience of switching back and forth seems worth the extra protection. Clearly, isn t going to hold against a determined and maybe targeted attacker (e.g. access to the X11 and the nix daemon socket can probably be used to escape easily). But I hope it will help against a compromised dev dependency that just deletes or exfiltrates data, like keys or passwords, from the usual places in $HOME.

Rough corners There is more polishing that could be done.
  • In particular, clicking on a link inside VSCode in the container will currently open Firefox inside the container, without access to my settings and cookies etc. Ideally, links would be opened in the Firefox running outside. This is a problem that has a solution in the world of applications that are sandboxed with Flatpak, and involves a bunch of moving parts (a xdg-desktop-portal user service, a filtering dbus proxy, exposing access to that proxy in the container). I experimented with that for a bit longer than I should have, but could not get it to work to satisfaction (even without a container involved, I could not get xdg-desktop-portal to heed my default browser settings ). For now I will live with manually copying and pasting URLs, we ll see how long this lasts.
  • With this setup (and unlike the NixOS container setup I tried first), the same applications are installed inside and outside. It might be useful to separate the set of installed programs: There is simply no point in running evolution or firefox inside the container, and if I do not even have VSCode or cabal available outside, so that it s less likely that I forget to enter dev before using these tools. It shouldn t be too hard to cargo-cult some of the NixOS Containers infrastructure to be able to have a separate system configuration that I can manage as part of my normal system configuration and make available to bubblewrap here.
So likely I will refine this some more over time. Or get tired of typing dev and going back to what I did before

The script
The dev script (at the time of writing)

9 March 2024

Reproducible Builds: Reproducible Builds in February 2024

Welcome to the February 2024 report from the Reproducible Builds project! In our reports, we try to outline what we have been up to over the past month as well as mentioning some of the important things happening in software supply-chain security.

Reproducible Builds at FOSDEM 2024 Core Reproducible Builds developer Holger Levsen presented at the main track at FOSDEM on Saturday 3rd February this year in Brussels, Belgium. However, that wasn t the only talk related to Reproducible Builds. However, please see our comprehensive FOSDEM 2024 news post for the full details and links.

Maintainer Perspectives on Open Source Software Security Bernhard M. Wiedemann spotted that a recent report entitled Maintainer Perspectives on Open Source Software Security written by Stephen Hendrick and Ashwin Ramaswami of the Linux Foundation sports an infographic which mentions that 56% of [polled] projects support reproducible builds .

Mailing list highlights From our mailing list this month:

Distribution work In Debian this month, 5 reviews of Debian packages were added, 22 were updated and 8 were removed this month adding to Debian s knowledge about identified issues. A number of issue types were updated as well. [ ][ ][ ][ ] In addition, Roland Clobus posted his 23rd update of the status of reproducible ISO images on our mailing list. In particular, Roland helpfully summarised that all major desktops build reproducibly with bullseye, bookworm, trixie and sid provided they are built for a second time within the same DAK run (i.e. [within] 6 hours) and that there will likely be further work at a MiniDebCamp in Hamburg. Furthermore, Roland also responded in-depth to a query about a previous report
Fedora developer Zbigniew J drzejewski-Szmek announced a work-in-progress script called fedora-repro-build that attempts to reproduce an existing package within a koji build environment. Although the projects README file lists a number of fields will always or almost always vary and there is a non-zero list of other known issues, this is an excellent first step towards full Fedora reproducibility.
Jelle van der Waa introduced a new linter rule for Arch Linux packages in order to detect cache files leftover by the Sphinx documentation generator which are unreproducible by nature and should not be packaged. At the time of writing, 7 packages in the Arch repository are affected by this.
Elsewhere, Bernhard M. Wiedemann posted another monthly update for his work elsewhere in openSUSE.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 256, 257 and 258 to Debian and made the following additional changes:
  • Use a deterministic name instead of trusting gpg s use-embedded-filenames. Many thanks to Daniel Kahn Gillmor dkg@debian.org for reporting this issue and providing feedback. [ ][ ]
  • Don t error-out with a traceback if we encounter struct.unpack-related errors when parsing Python .pyc files. (#1064973). [ ]
  • Don t try and compare rdb_expected_diff on non-GNU systems as %p formatting can vary, especially with respect to MacOS. [ ]
  • Fix compatibility with pytest 8.0. [ ]
  • Temporarily fix support for Python 3.11.8. [ ]
  • Use the 7zip package (over p7zip-full) after a Debian package transition. (#1063559). [ ]
  • Bump the minimum Black source code reformatter requirement to 24.1.1+. [ ]
  • Expand an older changelog entry with a CVE reference. [ ]
  • Make test_zip black clean. [ ]
In addition, James Addison contributed a patch to parse the headers from the diff(1) correctly [ ][ ] thanks! And lastly, Vagrant Cascadian pushed updates in GNU Guix for diffoscope to version 255, 256, and 258, and updated trydiffoscope to 67.0.6.

reprotest reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Vagrant Cascadian made a number of changes, including:
  • Create a (working) proof of concept for enabling a specific number of CPUs. [ ][ ]
  • Consistently use 398 days for time variation rather than choosing randomly and update README.rst to match. [ ][ ]
  • Support a new --vary=build_path.path option. [ ][ ][ ][ ]

Website updates There were made a number of improvements to our website this month, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In February, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Temporarily disable upgrading/bootstrapping Debian unstable and experimental as they are currently broken. [ ][ ]
    • Use the 64-bit amd64 kernel on all i386 nodes; no more 686 PAE kernels. [ ]
    • Add an Erlang package set. [ ]
  • Other changes:
    • Grant Jan-Benedict Glaw shell access to the Jenkins node. [ ]
    • Enable debugging for NetBSD reproducibility testing. [ ]
    • Use /usr/bin/du --apparent-size in the Jenkins shell monitor. [ ]
    • Revert reproducible nodes: mark osuosl2 as down . [ ]
    • Thanks again to Codethink, for they have doubled the RAM on our arm64 nodes. [ ]
    • Only set /proc/$pid/oom_score_adj to -1000 if it has not already been done. [ ]
    • Add the opemwrt-target-tegra and jtx task to the list of zombie jobs. [ ][ ]
Vagrant Cascadian also made the following changes:
  • Overhaul the handling of OpenSSH configuration files after updating from Debian bookworm. [ ][ ][ ]
  • Add two new armhf architecture build nodes, virt32z and virt64z, and insert them into the Munin monitoring. [ ][ ] [ ][ ]
In addition, Alexander Couzens updated the OpenWrt configuration in order to replace the tegra target with mpc85xx [ ], Jan-Benedict Glaw updated the NetBSD build script to use a separate $TMPDIR to mitigate out of space issues on a tmpfs-backed /tmp [ ] and Zheng Junjie added a link to the GNU Guix tests [ ]. Lastly, node maintenance was performed by Holger Levsen [ ][ ][ ][ ][ ][ ] and Vagrant Cascadian [ ][ ][ ][ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

4 March 2024

Colin Watson: Free software activity in January/February 2024

Two months into my new gig and it s going great! Tracking my time has taken a bit of getting used to, but having something that amounts to a queryable database of everything I ve done has also allowed some helpful introspection. Freexian sponsors up to 20% of my time on Debian tasks of my choice. In fact I ve been spending the bulk of my time on debusine which is itself intended to accelerate work on Debian, but more details on that later. While I contribute to Freexian s summaries now, I ve also decided to start writing monthly posts about my free software activity as many others do, to get into some more detail. January 2024 February 2024

3 March 2024

Iustin Pop: New corydalis 2024.9.0 release!

Obligatory and misused quote: It s not dead, Jim! I ve kind of dropped by ball lately on organising my own photo collection, but February was a pretty good month and I managed to write some more code for Corydalis, ending up with the aforementioned new release. The release is not a big one, but I did manage to solve one thing that was annoying me greatly: that lack of ability to play videos inline in one of the two picture viewing modes (in my preferred mode, in fact). Now, whether you re browsing through pictures, or looking at pictures one-by-one, you can in both cases play videos easily, and to some extent, as it should be . No user docs for that, yet (I actually need to split the manual in user/admin/developer parts) I did some more internal cleanups, and I ve enabled building release zips (since that s how GitHub actions creates artifacts), which means it should be 10% easier to test this. The rest 90% is configuring it and pointing to picture folders and and and, so this is definitely not plug-and-play. The diff summary between 2023.44.0 and 2024.9.0 is: 56 files changed, 1412 insertions(+), 700 deletions(-). Which is not bad, but also not too much. The biggest churn was, as expected, in the viewer (due to the aforementioned video playing). The scary part is that the TypeScript code is not at 7.9% (and a tiny more JS, which I can t convert yet due to lack of type definitions upstream). I say scary in quotes, because I would actually like to know Typescript better, but no time. The new release can be seen in action on demo.corydalis.io, and as always, just after release I found two minor issues: Well, there will be future releases. For now, I ve made an open-source package release, which I didn t do in a while, so I m happy . See you!

Petter Reinholdtsen: RAID status from LSI Megaraid controllers using free software

The last few days I have revisited RAID setup using the LSI Megaraid controller. These are a family of controllers called PERC by Dell, and is present in several old PowerEdge servers, and I recently got my hands on one of these. I had forgotten how to handle this RAID controller in Debian, so I had to take a peek in the Debian wiki page "Linux and Hardware RAID: an administrator's summary" to remember what kind of software is available to configure and monitor the disks and controller. I prefer Free Software alternatives to proprietary tools, as the later tend to fall into disarray once the manufacturer loose interest, and often do not work with newer Linux Distributions. Sadly there is no free software tool to configure the RAID setup, only to monitor it. RAID can provide improved reliability and resilience in a storage solution, but only if it is being regularly checked and any broken disks are being replaced in time. I thus want to ensure some automatic monitoring is available. In the discovery process, I came across a old free software tool to monitor PERC2, PERC3, PERC4 and PERC5 controllers, which to my surprise is not present in debian. To help change that I created a request for packaging of the megactl package, and tried to track down a usable version. The original project site is on Sourceforge, but as far as I can tell that project has been dead for more than 15 years. I managed to find a more recent fork on github from user hmage, but it is unclear to me if this is still being maintained. It has not seen much improvements since 2016. A more up to date edition is a git fork from the original github fork by user namiltd, and this newer fork seem a lot more promising. The owner of this github repository has replied to change proposals within hours, and had already added some improvements and support for more hardware. Sadly he is reluctant to commit to maintaining the tool and stated in my first pull request that he think a new release should be made based on the git repository owned by hmage. I perfectly understand this reluctance, as I feel the same about maintaining yet another package in Debian when I barely have time to take care of the ones I already maintain, but do not really have high hopes that hmage will have time to spend on it and hope namiltd will change his mind. In any case, I created a draft package based on the namiltd edition and put it under the debian group on salsa.debian.org. If you own a Dell PowerEdge server with one of the PERC controllers, or any other RAID controller using the megaraid or megaraid_sas Linux kernel modules, you might want to check it out. If enough people are interested, perhaps the package will make it into the Debian archive. There are two tools provided, megactl for the megaraid Linux kernel module, and megasasctl for the megaraid_sas Linux kernel module. The simple output from the command on one of my machines look like this (yes, I know some of the disks have problems. :).
# megasasctl 
a0       PERC H730 Mini           encl:1 ldrv:2  batt:good
a0d0       558GiB RAID 1   1x2  optimal
a0d1      3067GiB RAID 0   1x11 optimal
a0e32s0     558GiB  a0d0  online   errs: media:0  other:19
a0e32s1     279GiB  a0d1  online  
a0e32s2     279GiB  a0d1  online  
a0e32s3     279GiB  a0d1  online  
a0e32s4     279GiB  a0d1  online  
a0e32s5     279GiB  a0d1  online  
a0e32s6     279GiB  a0d1  online  
a0e32s8     558GiB  a0d0  online   errs: media:0  other:17
a0e32s9     279GiB  a0d1  online  
a0e32s10    279GiB  a0d1  online  
a0e32s11    279GiB  a0d1  online  
a0e32s12    279GiB  a0d1  online  
a0e32s13    279GiB  a0d1  online  
#
In addition to displaying a simple status report, it can also test individual drives and print the various event logs. Perhaps you too find it useful? In the packaging process I provided some patches upstream to improve installation and ensure a Appstream metainfo file is provided to list all supported HW, to allow isenkram to propose the package on all servers with a relevant PCI card. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

25 February 2024

Russ Allbery: Review: The Fund

Review: The Fund, by Rob Copeland
Publisher: St. Martin's Press
Copyright: 2023
ISBN: 1-250-27694-2
Format: Kindle
Pages: 310
I first became aware of Ray Dalio when either he or his publisher plastered advertisements for The Principles all over the San Francisco 4th and King Caltrain station. If I recall correctly, there were also constant radio commercials; it was a whole thing in 2017. My brain is very good at tuning out advertisements, so my only thought at the time was "some business guy wrote a self-help book." I think I vaguely assumed he was a CEO of some traditional business, since that's usually who writes heavily marketed books like this. I did not connect him with hedge funds or Bridgewater, which I have a bad habit of confusing with Blackwater. The Principles turns out to be more of a laundered cult manual than a self-help book. And therein lies a story. Rob Copeland is currently with The New York Times, but for many years he was the hedge fund reporter for The Wall Street Journal. He covered, among other things, Bridgewater Associates, the enormous hedge fund founded by Ray Dalio. The Fund is a biography of Ray Dalio and a history of Bridgewater from its founding as a vehicle for Dalio's advising business until 2022 when Dalio, after multiple false starts and title shuffles, finally retired from running the company. (Maybe. Based on the history recounted here, it wouldn't surprise me if he was back at the helm by the time you read this.) It is one of the wildest, creepiest, and most abusive business histories that I have ever read. It's probably worth mentioning, as Copeland does explicitly, that Ray Dalio and Bridgewater hate this book and claim it's a pack of lies. Copeland includes some of their denials (and many non-denials that sound as good as confirmations to me) in footnotes that I found increasingly amusing.
A lawyer for Dalio said he "treated all employees equally, giving people at all levels the same respect and extending them the same perks."
Uh-huh. Anyway, I personally know nothing about Bridgewater other than what I learned here and the occasional mention in Matt Levine's newsletter (which is where I got the recommendation for this book). I have no independent information whether anything Copeland describes here is true, but Copeland provides the typical extensive list of notes and sourcing one expects in a book like this, and Levine's comments indicated it's generally consistent with Bridgewater's industry reputation. I think this book is true, but since the clear implication is that the world's largest hedge fund was primarily a deranged cult whose employees mostly spied on and rated each other rather than doing any real investment work, I also have questions, not all of which Copeland answers to my satisfaction. But more on that later. The center of this book are the Principles. These were an ever-changing list of rules and maxims for how people should conduct themselves within Bridgewater. Per Copeland, although Dalio later published a book by that name, the version of the Principles that made it into the book was sanitized and significantly edited down from the version used inside the company. Dalio was constantly adding new ones and sometimes changing them, but the common theme was radical, confrontational "honesty": never being silent about problems, confronting people directly about anything that they did wrong, and telling people all of their faults so that they could "know themselves better." If this sounds like textbook abusive behavior, you have the right idea. This part Dalio admits to openly, describing Bridgewater as a firm that isn't for everyone but that achieves great results because of this culture. But the uncomfortably confrontational vibes are only the tip of the iceberg of dysfunction. Here are just a few of the ways this played out according to Copeland: In one of the common and all-too-disturbing connections between Wall Street finance and the United States' dysfunctional government, James Comey (yes, that James Comey) ran internal security for Bridgewater for three years, meaning that he was the one who pulled evidence from surveillance cameras for Dalio to use to confront employees during his trials. In case the cult vibes weren't strong enough already, Bridgewater developed its own idiosyncratic language worthy of Scientology. The trials were called "probings," firing someone was called "sorting" them, and rating them was called "dotting," among many other Bridgewater-specific terms. Needless to say, no one ever probed Dalio himself. You will also be completely unsurprised to learn that Copeland documents instances of sexual harassment and discrimination at Bridgewater, including some by Dalio himself, although that seems to be a relatively small part of the overall dysfunction. Dalio was happy to publicly humiliate anyone regardless of gender. If you're like me, at this point you're probably wondering how Bridgewater continued operating for so long in this environment. (Per Copeland, since Dalio's retirement in 2022, Bridgewater has drastically reduced the cult-like behaviors, deleted its archive of probings, and de-emphasized the Principles.) It was not actually a religious cult; it was a hedge fund that has to provide investment services to huge, sophisticated clients, and by all accounts it's a very successful one. Why did this bizarre nightmare of a workplace not interfere with Bridgewater's business? This, I think, is the weakest part of this book. Copeland makes a few gestures at answering this question, but none of them are very satisfying. First, it's clear from Copeland's account that almost none of the employees of Bridgewater had any control over Bridgewater's investments. Nearly everyone was working on other parts of the business (sales, investor relations) or on cult-related obsessions. Investment decisions (largely incorporated into algorithms) were made by a tiny core of people and often by Dalio himself. Bridgewater also appears to not trade frequently, unlike some other hedge funds, meaning that they probably stay clear of the more labor-intensive high-frequency parts of the business. Second, Bridgewater took off as a hedge fund just before the hedge fund boom in the 1990s. It transformed from Dalio's personal consulting business and investment newsletter to a hedge fund in 1990 (with an earlier investment from the World Bank in 1987), and the 1990s were a very good decade for hedge funds. Bridgewater, in part due to Dalio's connections and effective marketing via his newsletter, became one of the largest hedge funds in the world, which gave it a sort of institutional momentum. No one was questioned for putting money into Bridgewater even in years when it did poorly compared to its rivals. Third, Dalio used the tried and true method of getting free publicity from the financial press: constantly predict an upcoming downturn, and aggressively take credit whenever you were right. From nearly the start of his career, Dalio predicted economic downturns year after year. Bridgewater did very well in the 2000 to 2003 downturn, and again during the 2008 financial crisis. Dalio aggressively takes credit for predicting both of those downturns and positioning Bridgewater correctly going into them. This is correct; what he avoids mentioning is that he also predicted downturns in every other year, the majority of which never happened. These points together create a bit of an answer, but they don't feel like the whole picture and Copeland doesn't connect the pieces. It seems possible that Dalio may simply be good at investing; he reads obsessively and clearly enjoys thinking about markets, and being an abusive cult leader doesn't take up all of his time. It's also true that to some extent hedge funds are semi-free money machines, in that once you have a sufficient quantity of money and political connections you gain access to investment opportunities and mechanisms that are very likely to make money and that the typical investor simply cannot access. Dalio is clearly good at making personal connections, and invested a lot of effort into forming close ties with tricky clients such as pools of Chinese money. Perhaps the most compelling explanation isn't mentioned directly in this book but instead comes from Matt Levine. Bridgewater touts its algorithmic trading over humans making individual trades, and there is some reason to believe that consistently applying an algorithm without regard to human emotion is a solid trading strategy in at least some investment areas. Levine has asked in his newsletter, tongue firmly in cheek, whether the bizarre cult-like behavior and constant infighting is a strategy to distract all the humans and keep them from messing with the algorithm and thus making bad decisions. Copeland leaves this question unsettled. Instead, one comes away from this book with a clear vision of the most dysfunctional workplace I have ever heard of, and an endless litany of bizarre events each more astonishing than the last. If you like watching train wrecks, this is the book for you. The only drawback is that, unlike other entries in this genre such as Bad Blood or Billion Dollar Loser, Bridgewater is a wildly successful company, so you don't get the schadenfreude of seeing a house of cards collapse. You do, however, get a helpful mental model to apply to the next person who tries to talk to you about "radical honesty" and "idea meritocracy." The flaw in this book is that the existence of an organization like Bridgewater is pointing to systematic flaws in how our society works, which Copeland is largely uninterested in interrogating. "How could this have happened?" is a rather large question to leave unanswered. The sheer outrageousness of Dalio's behavior also gets a bit tiring by the end of the book, when you've seen the patterns and are hearing about the fourth variation. But this is still an astonishing book, and a worthy entry in the genre of capitalism disasters. Rating: 7 out of 10

24 February 2024

Niels Thykier: Language Server for Debian: Spellchecking

This is my third update on writing a language server for Debian packaging files, which aims at providing a better developer experience for Debian packagers. Lets go over what have done since the last report.
Semantic token support I have added support for what the Language Server Protocol (LSP) call semantic tokens. These are used to provide the editor insights into tokens of interest for users. Allegedly, this is what editors would use for syntax highlighting as well. Unfortunately, eglot (emacs) does not support semantic tokens, so I was not able to test this. There is a 3-year old PR for supporting with the last update being ~3 month basically saying "Please sign the Copyright Assignment". I pinged the GitHub issue in the hopes it will get unstuck. For good measure, I also checked if I could try it via neovim. Before installing, I read the neovim docs, which helpfully listed the features supported. Sadly, I did not spot semantic tokens among those and parked from there. That was a bit of a bummer, but I left the feature in for now. If you have an LSP capable editor that supports semantic tokens, let me know how it works for you! :)
Spellchecking Finally, I implemented something Otto was missing! :) This stared with Paul Wise reminding me that there were Python binding for the hunspell spellchecker. This enabled me to get started with a quick prototype that spellchecked the Description fields in debian/control. I also added spellchecking of comments while I was add it. The spellchecker runs with the standard en_US dictionary from hunspell-en-us, which does not have a lot of technical terms in it. Much less any of the Debian specific slang. I spend considerable time providing a "built-in" wordlist for technical and Debian specific slang to overcome this. I also made a "wordlist" for known Debian people that the spellchecker did not recognise. Said wordlist is fairly short as a proof of concept, and I fully expect it to be community maintained if the language server becomes a success. My second problem was performance. As I had suspected that spellchecking was not the fastest thing in the world. Therefore, I added a very small language server for the debian/changelog, which only supports spellchecking the textual part. Even for a small changelog of a 1000 lines, the spellchecking takes about 5 seconds, which confirmed my suspicion. With every change you do, the existing diagnostics hangs around for 5 seconds before being updated. Notably, in emacs, it seems that diagnostics gets translated into an absolute character offset, so all diagnostics after the change gets misplaced for every character you type. Now, there is little I could do to speed up hunspell. But I can, as always, cheat. The way diagnostics work in the LSP is that the server listens to a set of notifications like "document opened" or "document changed". In a response to that, the LSP can start its diagnostics scanning of the document and eventually publish all the diagnostics to the editor. The spec is quite clear that the server owns the diagnostics and the diagnostics are sent as a "notification" (that is, fire-and-forgot). Accordingly, there is nothing that prevents the server from publishing diagnostics multiple times for a single trigger. The only requirement is that the server publishes the accumulated diagnostics in every publish (that is, no delta updating). Leveraging this, I had the language server for debian/changelog scan the document and publish once for approximately every 25 typos (diagnostics) spotted. This means you quickly get your first result and that clears the obsolete diagnostics. Thereafter, you get frequent updates to the remainder of the document if you do not perform any further changes. That is, up to a predefined max of typos, so we do not overload the client for longer changelogs. If you do any changes, it resets and starts over. The only bit missing was dealing with concurrency. By default, a pygls language server is single threaded. It is not great if the language server hangs for 5 seconds everytime you type anything. Fortunately, pygls has builtin support for asyncio and threaded handlers. For now, I did an async handler that await after each line and setup some manual detection to stop an obsolete diagnostics run. This means the server will fairly quickly abandon an obsolete run. Also, as a side-effect of working on the spellchecking, I fixed multiple typos in the changelog of debputy. :)
Follow up on the "What next?" from my previous update In my previous update, I mentioned I had to finish up my python-debian changes to support getting the location of a token in a deb822 file. That was done, the MR is now filed, and is pending review. Hopefully, it will be merged and uploaded soon. :) I also submitted my proposal for a different way of handling relationship substvars to debian-devel. So far, it seems to have received only positive feedback. I hope it stays that way and we will have this feature soon. Guillem proposed to move some of this into dpkg, which might delay my plans a bit. However, it might be for the better in the long run, so I will wait a bit to see what happens on that front. :) As noted above, I managed to add debian/changelog as a support format for the language server. Even if it only does spellchecking and trimming of trailing newlines on save, it technically is a new format and therefore cross that item off my list. :D Unfortunately, I did not manage to write a linter variant that does not involve using an LSP-capable editor. So that is still pending. Instead, I submitted an MR against elpa-dpkg-dev-el to have it recognize all the fields that the debian/control LSP knows about at this time to offset the lack of semantic token support in eglot.
From here... My sprinting on this topic will soon come to an end, so I have to a bit more careful now with what tasks I open! I think I will narrow my focus to providing a batch linting interface. Ideally, with an auto-fix for some of the more mechanical issues, where this is little doubt about the answer. Additionally, I think the spellchecking will need a bit more maturing. My current code still trips on naming patterns that are "clearly" verbatim or code references like things written in CamelCase or SCREAMING_SNAKE_CASE. That gets annoying really quickly. It also trips on a lot of commands like dpkg-gencontrol, but that is harder to fix since it could have been a real word. I think those will have to be fixed people using quotes around the commands. Maybe the most popular ones will end up in the wordlist. Beyond that, I will play it by ear if I have any time left. :)

20 February 2024

Niels Thykier: Language Server (LSP) support for debian/control

About a month ago, Otto Kek l inen asked for editor extensions for debian related files on the debian-devel mailing list. In that thread, I concluded that what we were missing was a "Language Server" (LSP) for our packaging files. Last week, I started a prototype for such a LSP for the debian/control file as a starting point based on the pygls library. The initial prototype worked and I could do very basic diagnostics plus completion suggestion for field names.
Current features I got 4 basic features implemented, though I have only been able to test two of them in emacs.
  • Diagnostics or linting of basic issues.
  • Completion suggestions for all known field names that I could think of and values for some fields.
  • Folding ranges (untested). This feature enables the editor to "fold" multiple lines. It is often used with multi-line comments and that is the feature currently supported.
  • On save, trim trailing whitespace at the end of lines (untested). Might not be registered correctly on the server end.
Despite its very limited feature set, I feel editing debian/control in emacs is now a much more pleasant experience. Coming back to the features that Otto requested, the above covers a grand total of zero. Sorry, Otto. It is not you, it is me.
Completion suggestions For completion, all known fields are completed. Place the cursor at the start of the line or in a partially written out field name and trigger the completion in your editor. In my case, I can type R-R-R and trigger the completion and the editor will automatically replace it with Rules-Requires-Root as the only applicable match. Your milage may vary since I delegate most of the filtering to the editor, meaning the editor has the final say about whether your input matches anything. The only filtering done on the server side is that the server prunes out fields already used in the paragraph, so you are not presented with the option to repeat an already used field, which would be an error. Admittedly, not an error the language server detects at the moment, but other tools will. When completing field, if the field only has one non-default value such as Essential which can be either no (the default, but you should not use it) or yes, then the completion suggestion will complete the field along with its value. This is mostly only applicable for "yes/no" fields such as Essential and Protected. But it does also trigger for Package-Type at the moment. As for completing values, here the language server can complete the value for simple fields such as "yes/no" fields, Multi-Arch, Package-Type and Priority. I intend to add support for Section as well - maybe also Architecture.
Diagnostics On the diagnostic front, I have added multiple diagnostics:
  • An error marker for syntax errors.
  • An error marker for missing a mandatory field like Package or Architecture. This also includes Standards-Version, which is admittedly mandatory by policy rather than tooling falling part.
  • An error marker for adding Multi-Arch: same to an Architecture: all package.
  • Error marker for providing an unknown value to a field with a set of known values. As an example, writing foo in Multi-Arch would trigger this one.
  • Warning marker for using deprecated fields such as DM-Upload-Allowed, or when setting a field to its default value for fields like Essential. The latter rule only applies to selected fields and notably Multi-Arch: no does not trigger a warning.
  • Info level marker if a field like Priority duplicates the value of the Source paragraph.
Notable omission at this time:
  • No errors are raised if a field does not have a value.
  • No errors are raised if a field is duplicated inside a paragraph.
  • No errors are used if a field is used in the wrong paragraph.
  • No spellchecking of the Description field.
  • No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.
  • Quick fixes to solve these problems... :)
Trying it out If you want to try, it is sadly a bit more involved due to things not being uploaded or merged yet. Also, be advised that I will regularly rebase my git branches as I revise the code. The setup:
  • Build and install the deb of the main branch of pygls from https://salsa.debian.org/debian/pygls The package is in NEW and hopefully this step will soon just be a regular apt install.
  • Build and install the deb of the rts-locatable branch of my python-debian fork from https://salsa.debian.org/nthykier/python-debian There is a draft MR of it as well on the main repo.
  • Build and install the deb of the lsp-support branch of debputy from https://salsa.debian.org/debian/debputy
  • Configure your editor to run debputy lsp debian/control as the language server for debian/control. This is depends on your editor. I figured out how to do it for emacs (see below). I also found a guide for neovim at https://neovim.io/doc/user/lsp. Note that debputy can be run from any directory here. The debian/control is a reference to the file format and not a concrete file in this case.
Obviously, the setup should get easier over time. The first three bullet points should eventually get resolved by merges and upload meaning you end up with an apt install command instead of them. For the editor part, I would obviously love it if we can add snippets for editors to make the automatically pick up the language server when the relevant file is installed.
Using the debputy LSP in emacs The guide I found so far relies on eglot. The guide below assumes you have the elpa-dpkg-dev-el package installed for the debian-control-mode. Though it should be a trivially matter to replace debian-control-mode with a different mode if you use a different mode for your debian/control file. In your emacs init file (such as ~/.emacs or ~/.emacs.d/init.el), you add the follow blob.
(with-eval-after-load 'eglot
    (add-to-list 'eglot-server-programs
        '(debian-control-mode . ("debputy" "lsp" "debian/control"))))
Once you open the debian/control file in emacs, you can type M-x eglot to activate the language server. Not sure why that manual step is needed and if someone knows how to automate it such that eglot activates automatically on opening debian/control, please let me know. For testing completions, I often have to manually activate them (with C-M-i or M-x complete-symbol). Though, it is a bit unclear to me whether this is an emacs setting that I have not toggled or something I need to do on the language server side.
From here As next steps, I will probably look into fixing some of the "known missing" items under diagnostics. The quick fix would be a considerable improvement to assisting users. In the not so distant future, I will probably start to look at supporting other files such as debian/changelog or look into supporting configuration, so I can cover formatting features like wrap-and-sort. I am also very much open to how we can provide integrations for this feature into editors by default. I will probably create a separate binary package for specifically this feature that pulls all relevant dependencies that would be able to provide editor integrations as well.

18 February 2024

Russell Coker: Release Years

In 2008 I wrote about the idea of having a scheduled release for Debian and other distributions as Mark Shuttleworth had proposed [1]. I still believe that Mark s original idea for synchronised release dates of Linux distributions (or at least synchronised feature sets) is a good one but unfortunately it didn t take off. Having been using Ubuntu a bit recently I ve found the version numbering system to be really good. Ubuntu version 16.04 was release in April 2016, it s support ended 5 years later in about April 2021, so any commonly available computers from 2020 should run it well and versions of applications released in about 2017 should run on it. If I have to support a Debian 10 (Buster) system I need to start with a web search to discover when it was released (July 2019). That suggests that applications packaged for Ubuntu 18.04 are likely to run on it. If we had the same numbering system for Debian and Ubuntu then it would be easier to compare versions. Debian 19.06 would be obviously similar to Ubuntu 18.04, or we could plan for the future and call it Debian 2019. Then it would be ideal if hardware vendors did the same thing (as car manufacturers have been doing for a long time). Which versions of Ubuntu and Debian would run well on a Dell PowerEdge R750? It takes a little searching to discover that the R750 was released in 2021, but if they called it a PowerEdge 2021R70 then it would be quite obvious that Ubuntu 2022.04 would run well on it and that Ubuntu 2020.04 probably has a kernel update with all the hardware supported. One of the benefits for the car industry in naming model years is that it drives the purchase of a new car. A 2015 car probably isn t going to impress anyone and you will know that it is missing some of the features in more recent models. It would be easier to get management to sign off on replacing old computers if they had 2015 on the front, trying to estimate hidden costs of support and lost productivity of using old computers is hard but saying it s a 2015 model and way out of date is easy. There is a history of using dates as software versions. The Reference Policy for SE Linux [2] (which is used for Debian) has releases based on date. During the Debian development process I upload policy to Debian based on the upstream Git and use the same version numbering scheme which is more convenient than the append git date to last full release system that some maintainers are forced to use. The users can get an idea of how much the Debian/Unstable policy has diverged from the last full release by looking at the dates. Also an observer might see the short difference between release dates of SE Linux policy and Debian release freeze dates as an indication that I beg the upstream maintainers to make a new release just before each Debian freeze which is expactly what I do. When I took over the Portslave [3] program I made releases based on date because there were several forks with different version numbering schemes so my options were to just choose a higher number (which is OK initially but doesn t scale if there are more forks) or use a date and have people know that the recent date is the most recent version. The fact that the latest release of Portslave is version 2010.04.19 shows that I have not been maintaining it in recent years (due to lack of hardware as well as lack of interest), so if someone wants to take over the project I would be happy to help them do so! I don t expect people at Dell and other hardware vendors to take much notice of my ideas (I have tweeted them photographic evidence of a problem with no good response). But hopefully this will start a discussion in the free software community.

16 February 2024

Mike Gabriel: Debian Edu 12 - Call for Testing

This is a call for testing of Debian Edu based on Debian bookworm. With the Debian 12.5 point release all required packages have landed in the Debian Edu ISO images that allow you to install a Debian Edu system based on Debian 12. ISO Image Downloads You can find the Blueray Disc ISO image (use for main server installation) at: http://cdimage.debian.org/cdimage/release/current/amd64/iso-bd/debian-ed... For standalone workstation installations or installations on an already up-and-running Debian Edu site, please use the netinst ISO image: http://cdimage.debian.org/cdimage/release/current/amd64/iso-cd/debian-ed... Quick Start HowTo For testing Debian Edu 12, set up e.g. LXD or libVirt and install (at least) three virtual machines. In your virtualization software prepare an internal network where the VMs can reach one another without needing access to your local network. The three VMs: Happy testing! Further Readings Overall installation profile concept of Debian Edu:
https://wiki.debian.org/DebianEdu/BeforeGettingStarted Debian Edu 12 manual:
https://jenkins.debian.net/userContent/debian-edu-doc/ Debian Edu 12 status page:
https://wiki.debian.org/DebianEdu/Status/Bookworm

12 February 2024

Emanuele Rocca: Enabling Kernel Settings in Debian

This time it s about enabling new kernel config options in the official Debian kernel packages. A few dependencies are needed to run the various scripts used by the Debian kernel folks, as well as to build the kernel itself:
apt install git gpg python3-debian python3-dacite
apt build-dep linux
With that in place, fetch the linux and kernel-team repos:
git clone --depth 1 https://salsa.debian.org/kernel-team/linux
git clone --depth 1 https://salsa.debian.org/kernel-team/kernel-team
So far you ve only got the Debian-specific bits. Fetch the actual kernel sources now. In the likely case that you re building a stable kernel, run the following from within the linux directory:
debian/bin/genorig.py https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Use the torvalds repo if you re building an RC version instead:
debian/bin/genorig.py https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Now generate the upstream tarball as well as debian/control. The first command will take a bit, and the second command will fail: but that s success just as the output says.
debian/rules orig
debian/rules debian/control
Now generate patched sources with:
debian/rules source
Time to edit the Kconfig and enable/disable whatever setting you wanted to change. Take a look around the files under debian/config/ to see where your changes should go. If it s a setting shared among multiple architectures that may be debian/config/config. For x86-specific things, the file is debian/config/amd64/config. On aarch64 debian/config/arm64/config. If in doubt, you could try asking #debian-kernel on IRC.
It may look like you need to figure out where exactly in the file the setting should be placed. That is not the case. There s a helpful script fixing things up for you:
../kernel-team/utils/kconfigeditor2/process.py .
The above will fail if you forgot to run debian/rules source. The debian/build/source_rt/Kconfig file is needed by the script:
Traceback (most recent call last):
  File "/tmp/linux/../kernel-team/utils/kconfigeditor2/process.py", line 19, in __init__
    menu = fs_menu[featureset or 'none']
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'rt'
During handling of the above exception, another exception occurred:
[...]
FileNotFoundError: [Errno 2] No such file or directory: './debian/build/source_rt/Kconfig'
If that happens, run:
debian/rules source
Now process.py should work fine and fix your config file.
Excellent, now the config is updated and we re ready to build the kernel. Off we go:
export MAKEFLAGS=-j$(nproc)
export DEB_BUILD_PROFILES='pkg.linux.nokerneldbg pkg.linux.nokerneldbginfo pkg.linux.notools nodoc'
dpkg-buildpackage -b -nc -uc

11 February 2024

Freexian Collaborators: Debian Contributions: Upcoming Improvements to Salsa CI, /usr-move, and more! (by Utkarsh Gupta)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Upcoming Improvements to Salsa CI, by Santiago Ruano Rinc n Santiago started picking up the work made by Outreachy Intern, Enock Kashada (a big thanks to him!), to solve some long-standing issues in Salsa CI. Currently, the first job in a Salsa CI pipeline is the extract-source job, used to produce a debianize source tree of the project. This job was introduced to make it possible to build the projects on different architectures, on the subsequent build jobs. However, that extract-source approach is sub-optimal: not only it increases the execution time of the pipeline by some minutes, but also projects whose source tree is too large are not able to use the pipeline. The debianize source tree is passed as an artifact to the build jobs, and for those large projects, the size of their source tree exceeds the Salsa s limits. This is specific issue is documented as issue #195, and the proposed solution is to get rid of the extract-source job, relying on sbuild in the very build job (see issue #296). Switching to sbuild would also help to improve the build source job, solving issues such as #187 and #298. The current work-in-progress is very preliminary, but it has already been possible to run the build (amd64), build-i386 and build-source job using sbuild with the unshare mode. The image on the right shows a pipeline that builds grep. All the test jobs use the artifacts of the new build job. There is a lot of remaining work, mainly making the integration with ccache work. This change could break some things, it will also be important to test how the new pipeline works with complex projects. Also, thanks to Emmanuel Arias, we are proposing a Google Summer of Code 2024 project to improve Salsa CI. As part of the ongoing work in preparation for the GSoC 2024 project, Santiago has proposed a merge request to make more efficient how contributors can test their changes on the Salsa CI pipeline.

/usr-move, by Helmut Grohne In January, we sent most of the moving patches for the set of packages involved with debootstrap. Notably missing is glibc, which turns out harder than anticipated via dumat, because it has Conflicts between different architectures, which dumat does not analyze. Patches for diversion mitigations have been updated in a way to not exhibit any loss anymore. The main change here is that packages which are being diverted now support the diverting packages in transitioning their diversions. We also supported a few packages with non-trivial changes such as netplan.io. dumat has been enhanced to better support derivatives such as Ubuntu.

Miscellaneous contributions
  • Python 3.12 migration trundles on. Stefano Rivera helped port several new packages to support 3.12.
  • Stefano updated the Sphinx configuration of DebConf Video Team s documentation, which was broken by Sphinx 7.
  • Stefano published the videos from the Cambridge MiniDebConf to YouTube and PeerTube.
  • DebConf 24 planning has begun, and Stefano & Utkarsh have started work on this.
  • Utkarsh re-sponsored the upload of golang-github-prometheus-community-pgbouncer-exporter for Lena.
  • Colin Watson added Incus support to autopkgtest.
  • Colin discovered Perl::Critic and used it to tidy up some poor practices in several of his packages, including debconf.
  • Colin did some overdue debconf maintenance, mainly around tidying up error message handling in several places (1, 2, 3).
  • Colin figured out how to update the mirror size documentation in debmirror, last updated in 2010. It should now be much easier to keep it up to date regularly.
  • Colin issued a man-db buster update to clean up some irritations due to strict sandboxing.
  • Thorsten Alteholz adopted two more packages, magicfilter and ifhp, for the debian-printing team. Those packages are the last ones of the latest round of adoptions to preserve the old printing protocol within Debian. If you know of other packages that should be retained, please don t hesitate to contact Thorsten.
  • Enrico participated in /usr-merge discussions with Helmut.
  • Helmut sent patches for 16 cross build failures.
  • Helmut supported Matthias Klose (not affiliated with Freexian) with adding -for-host support to gcc-defaults.
  • Helmut uploaded dput-ng enabling dcut migrate and merging two MRs of Ben Hutchings.
  • Santiago took part in the discussions relating to the EU Cyber Resilience Act (CRA) and the Debian public statement that was published last year. He participated in a meeting with Members of the European Parliament (MEPs), Marcel Kolaja and Karen Melchior, and their teams to clarify some points about the impact of the CRA and Debian and downstream projects, and the improvements in the last version of the proposed regulation.

9 February 2024

Abhijith PA: A new(kind of) phone

I was using a refurbished Xiaomi Redmi 6 Pro (codename: sakura . I do remember buying this phone to run LineageOS as I found this model had support. But by the time I bought it (there was a quite a gap between my research and the actual purchase) lineageOS ended the suport for this device. The maintainer might ve ended this port, I don t blame them. Its a non rewarding work. Later I found there is a DotOS custom rom available. I did a week of test run. There seems lot of bugs and it all was minor and I can live with that. Since there is no other official port for redmi 6 pro at the time, I settled on this buggy DotOS nightly build. The phone was doing fine all these(3+) years, but recently the camera started showing greyish dots on some areas to a point that I can t even properly take a photo. The outer body of the phone wasn t original, as it was a refurbished, thus the colors started to peel off and wasn t aesthetically pleasing. Then there is the Location detection issue which took a while to figure out location and battery drains fast. I recently started mapping public transportation routes, and the GPS issue was a problem for me. With all these problems combined, I decided to buy a new phone. But choosing a phone wasn t easy. There are far too many options in the market. However, one thing I was quite sure of was that I wouldn t be buying a brand new phone, but a refurbished one. It is my protest against all these phone manufacturers who have convinced the general public that a mobile phone s lifespan is only 2 years. Of course, there are a few exceptions, but the majority of players in the Indian market are not. Now I think about it, I haven t bought any brand new computing device, be it phones or laptops since 2015. All are refurbished devices except for an Orange pi. My Samsung R439 laptop which I already blogged about is still going strong. Back to picking phone. So I began by comparing LineageOS website to an online refurbished store to pick a phone that has an official lineage support then to reddit search for any reported hardware failures Google Pixel phones are well-known in the custom ROM community for ease of installation, good hardware and Android security releases. My above claims are from privacyguides.org and XDA-developers. I was convinced to go with this choice. But I have some one at home who is at the age of throwing things. So I didn t want to risk buying an expensive Pixel phone only to have it end up with a broken screen and edges. Perhaps I will consider buying a Pixel next time. After doing some more research and cross comparison I landed up on Redmi note 9. Codename: merlinx. Based on my previous experience I knew it is going to be a pain to unlock the bootloader of Xiaomi phones and I was prepared for that but this time there was an extra hoop. The phone came with ROM MIUI version 13. A small search on XDA forums and reddit told unless we downgrade to 12 or so it is difficult to unlock bootloader for this device. And performing this wasn t exactly easy as we need tools like SP Flash etc. It took some time, but I ve completed the downgrade process. From there, the rest was cakewalk as everything was perfectly documented in LineageOS wiki. And ta-da, I have a phone running lineageOS. I been using it for some time and honestly I haven t come across any bug in my way of usage. One advice I will give to you if you are going to flash a custom ROM. There are plenty of videos about unlocking bootloader, installing ROMs in Youtube. Please REFRAIN from watching and experimenting it on your phone unless you figured a way to un brick your device first.

6 February 2024

Robert McQueen: Flathub: Pros and Cons of Direct Uploads

I attended FOSDEM last weekend and had the pleasure to participate in the Flathub / Flatpak BOF on Saturday. A lot of the session was used up by an extensive discussion about the merits (or not) of allowing direct uploads versus building everything centrally on Flathub s infrastructure, and related concerns such as automated security/dependency scanning. My original motivation behind the idea was essentially two things. The first was to offer a simpler way forward for applications that use language-specific build tools that resolve and retrieve their own dependencies from the internet. Flathub doesn t allow network access during builds, and so a lot of manual work and additional tooling is currently needed (see Python and Electron Flatpak guides). And the second was to offer a maybe more familiar flow to developers from other platforms who would just build something and then run another command to upload it to the store, without having to learn the syntax of a new build tool. There were many valid concerns raised in the room, and I think on reflection that this is still worth doing, but might not be as valuable a way forward for Flathub as I had initially hoped. Of course, for a proprietary application where Flathub never sees the source or where it s built, whether that binary is uploaded to us or downloaded by us doesn t change much. But for an FLOSS application, a direct upload driven by the developer causes a regression on a number of fronts. We re not getting too hung up on the malicious developer inserts evil code in the binary case because Flathub already works on the model of verifying the developer and the user makes a decision to trust that app we don t review the source after all. But we do lose other things such as our infrastructure building on multiple architectures, and visibility on whether the build environment or upload credentials have been compromised unbeknownst to the developer. There is now a manual review process for when apps change their metadata such as name, icon, license and permissions which would apply to any direct uploads as well. It was suggested that if only heavily sandboxed apps (eg no direct filesystem access without proper use of portals) were permitted to make direct uploads, the impact of such concerns might be somewhat mitigated by the sandboxing. However, it was also pointed out that my go-to example of Electron app developers can upload to Flathub with one command was also a bit of a fiction. At present, none of them would pass that stricter sandboxing requirement. Almost all Electron apps run old versions of Chromium with less complete portal support, needing sandbox escapes to function correctly, and Electron (and Chromium s) sandboxing still needs additional tooling/downstream patching to run inside a Flatpak. Buh-boh. I think for established projects who already ship their own binaries from their own centralised/trusted infrastructure, and for developers who have understandable sensitivities about binary integrity such such as encryption, password or financial tools, it s a definite improvement that we re able to set up direct uploads with such projects with less manual work. There are already quite a few applications including verified ones where the build recipe simply fetches a binary built elsewhere and unpacks it, and if this already done centrally by the developer, repeating the exercise on Flathub s server adds little value. However for the individual developer experience, I think we need to zoom out a bit and think about how to improve this from a tools and infrastructure perspective as we grow Flathub, and as we seek to raise funds for different sources for these improvements. I took notes for everything that was mentioned as a tooling limitation during the BOF, along with a few ideas about how we could improve things, and hope to share these soon as part of an RFP/RFI (Request For Proposals/Request for Information) process. We don t have funding yet but if we have some prospective collaborators to help refine the scope and estimate the cost/effort, we can use this to go and pursue funding opportunities.

2 February 2024

Scarlett Gately Moore: Some exciting news! Kubuntu: I m back!!!

It s official, the Kubuntu Council has hired me part time to work on the 24.04 LTS release, preparation for Plasma 6, and to bring life back into the Distribution. First I want thank the Kubuntu Council for this opportunity and I plan a long and successful journey together!!!! My first week ( I started midweek ): It has been a busy one! Many meet and greets with the team and other interested parties. I had the chance to chat with Mike from Kubuntu Focus and I have to say I am absolutely amazed with the work they have done, and if you are in the market for a new laptop, you must check these out!!! https://kfocus.org Or if you want to try before you buy you can download the OS! All they ask is for an e-mail, which is completely reasonable. Hosting isn t free! Besides, you can opt out anytime and they don t share it with anyone. I look forward to working closely with this project. We now have a Kubuntu Team in KDE invent https://invent.kde.org/teams/distribution-kubuntu if you would like to join us, please don t hesitate to ask! I have started a new Wiki and our first page is the ever important Bug triaging! It is still a WIP but you can check it out here: https://invent.kde.org/teams/distribution-kubuntu/docs/-/wikis/Bug-Triage-Story-WIP , with that said I have started the launchpad work to make tracking our bugs easier buy subscribing kubuntu-bugs to all our packages and creating proper projects for our packages missing them. We have compiled a list of our various documentation links that need updated and Rick Timmis is updating kubuntu.org! Aaron Honeycutt has been busy with the Kubuntu Manual https://github.com/kubuntu-team/kubuntu-manual which is in good shape. We just need to improve our developer story  I have been working on the rather massive Apparmor bug https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2046844 with testing the fixes from the ppa and writing profiles for the various KDE packages affected ( pretty much anything that uses webengine ) and making progress there. My next order of business staging Frameworks 5.114 with guidance from our super awesome Rik Mills that has been doing most of the heavy lifting in Kubuntu for many years now. So thank you for that Rik  I will also start on our big transition to the Calamaras Installer! I do have experience here, so I expect it will be a smooth one. I am so excited for the future of Kubuntu and the exciting things to come! With that said, the Kubuntu funding is community donation driven. There is enough to pay me part time for a couple contracts, but it will run out and a full-time contract would be super awesome. I am reaching out to anyone enjoying Kubuntu and want to help with the future of Kubuntu to please consider a donation! We are working on more donation options, but for now you can donate through paypal at https://kubuntu.org/donate/ Thank you!!!!!

Ben Hutchings: FOSS activity in December 2023

30 January 2024

Matthew Palmer: Why Certificate Lifecycle Automation Matters

If you ve perused the ActivityPub feed of certificates whose keys are known to be compromised, and clicked on the Show More button to see the name of the certificate issuer, you may have noticed that some issuers seem to come up again and again. This might make sense after all, if a CA is issuing a large volume of certificates, they ll be seen more often in a list of compromised certificates. In an attempt to see if there is anything that we can learn from this data, though, I did a bit of digging, and came up with some illuminating results.

The Procedure I started off by finding all the unexpired certificates logged in Certificate Transparency (CT) logs that have a key that is in the pwnedkeys database as having been publicly disclosed. From this list of certificates, I removed duplicates by matching up issuer/serial number tuples, and then reduced the set by counting the number of unique certificates by their issuer. This gave me a list of the issuers of these certificates, which looks a bit like this:
/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA
/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2
/C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020
Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.

The Data
Lieutenant Commander Data from Star Trek: The Next Generation Insert obligatory "not THAT data" comment here
The end result of this work is the following table, sorted by the count of certificates which have been compromised by exposing their private key:
IssuerCompromised Count
Sectigo170
ISRG (Let's Encrypt)161
GoDaddy141
DigiCert81
GlobalSign46
Entrust3
SSL.com1
If you re familiar with the CA ecosystem, you ll probably recognise that the organisations with large numbers of compromised certificates are also those who issue a lot of certificates. So far, nothing particularly surprising, then. Let s look more closely at the relationships, though, to see if we can get more useful insights.

Volume Control Using the issuance volume report from crt.sh, we can compare issuance volumes to compromise counts, to come up with a compromise rate . I m using the Unexpired Precertificates colume from the issuance volume report, as I feel that s the number that best matches the certificate population I m examining to find compromised certificates. To maintain parity with the previous table, this one is still sorted by the count of certificates that have been compromised.
IssuerIssuance VolumeCompromised CountCompromise Rate
Sectigo88,323,0681701 in 519,547
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
GoDaddy56,121,4291411 in 398,024
DigiCert144,713,475811 in 1,786,586
GlobalSign1,438,485461 in 31,271
Entrust23,16631 in 7,722
SSL.com171,81611 in 171,816
If we now sort this table by compromise rate, we can see which organisations have the most (and least) leakiness going on from their customers:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
Sectigo88,323,0681701 in 519,547
DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
By grouping by order-of-magnitude in the compromise rate, we can identify three bands :
  • The Super Leakers: Customers of Entrust and GlobalSign seem to love to lose control of their private keys. For Entrust, at least, though, the small volumes involved make the numbers somewhat untrustworthy. The three compromised certificates could very well belong to just one customer, for instance. I m not aware of anything that GlobalSign does that would make them such an outlier, either, so I m inclined to think they just got unlucky with one or two customers, but as CAs don t include customer IDs in the certificates they issue, it s not possible to say whether that s the actual cause or not.
  • The Regular Leakers: Customers of SSL.com, GoDaddy, and Sectigo all have compromise rates in the 1-in-hundreds-of-thousands range. Again, the low volumes of SSL.com make the numbers somewhat unreliable, but the other two organisations in this group have large enough numbers that we can rely on that data fairly well, I think.
  • The Low Leakers: Customers of DigiCert and Let s Encrypt are at least three times less likely than customers of the regular leakers to lose control of their private keys. Good for them!
Now we have some useful insights we can think about.

Why Is It So?
Professor Julius Sumner Miller If you don't know who Professor Julius Sumner Miller is, I highly recommend finding out
All of the organisations on the list, with the exception of Let s Encrypt, are what one might term traditional CAs. To a first approximation, it s reasonable to assume that the vast majority of the customers of these traditional CAs probably manage their certificates the same way they have for the past two decades or more. That is, they generate a key and CSR, upload the CSR to the CA to get a certificate, then copy the cert and key somewhere. Since humans are handling the keys, there s a higher risk of the humans using either risky practices, or making a mistake, and exposing the private key to the world. Let s Encrypt, on the other hand, issues all of its certificates using the ACME (Automatic Certificate Management Environment) protocol, and all of the Let s Encrypt documentation encourages the use of software tools to generate keys, issue certificates, and install them for use. Given that Let s Encrypt has 161 compromised certificates currently in the wild, it s clear that the automation in use is far from perfect, but the significantly lower compromise rate suggests to me that lifecycle automation at least reduces the rate of key compromise, even though it doesn t eliminate it completely.

Explaining the Outlier The difference in presumed issuance practices would seem to explain the significant difference in compromise rates between Let s Encrypt and the other organisations, if it weren t for one outlier. This is a largely traditional CA, with the manual-handling issues that implies, but with a compromise rate close to that of Let s Encrypt. We are, of course, talking about DigiCert. The thing about DigiCert, that doesn t show up in the raw numbers from crt.sh, is that DigiCert manages the issuance of certificates for several of the biggest hosted TLS providers, such as CloudFlare and AWS. When these services obtain a certificate from DigiCert on their customer s behalf, the private key is kept locked away, and no human can (we hope) get access to the private key. This is supported by the fact that no certificates identifiably issued to either CloudFlare or AWS appear in the set of certificates with compromised keys. When we ask for all certificates issued by DigiCert , we get both the certificates issued to these big providers, which are very good at keeping their keys under control, as well as the certificates issued to everyone else, whose key handling practices may not be quite so stringent. It s possible, though not trivial, to account for certificates issued to these hosted TLS providers, because the certificates they use are issued from intermediates branded to those companies. With the crt.sh psql interface we can run this query to get the total number of unexpired precertificates issued to these managed services:
SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2])
  FROM (
    SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER,
           ca.NUM_ISSUED, ca.NUM_EXPIRED
      FROM ccadb_certificate cc, ca_certificate cac, ca
     WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID
       AND cac.CA_ID = ca.ID
  GROUP BY ca.ID
  ) sub
 WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';
The number I get from running that query is 104,316,112, which should be subtracted from DigiCert s total issuance figures to get a more accurate view of what DigiCert s regular customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
"Regular" DigiCert40,397,363811 in 498,732
Sectigo88,323,0681701 in 519,547
All DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
In short, it appears that DigiCert s regular customers are just as likely as GoDaddy or Sectigo customers to expose their private keys.

What Does It All Mean? The takeaway from all this is fairly straightforward, and not overly surprising, I believe.

The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key. While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom. Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally. The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it s still possible for humans to handle (and mistakenly expose) key material if they try hard enough. Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.

Humans Are, Of Course, The Problem
Bender, the robot from Futurama, asking if we'd like to kill all humans No thanks, Bender, I'm busy tonight
This observation that if you don t let humans near keys, they don t get leaked is further supported by considering the biggest issuers by volume who have not issued any certificates whose keys have been compromised: Google Trust Services (fourth largest issuer overall, with 57,084,529 unexpired precertificates), and Microsoft Corporation (sixth largest issuer overall, with 22,852,468 unexpired precertificates). It appears that somewhere between most and basically all of the certificates these organisations issue are to customers of their public clouds, and my understanding is that the keys for these certificates are managed in same manner as CloudFlare and AWS the keys are locked away where humans can t get to them. It should, of course, go without saying that if a human can never have access to a private key, it makes it rather difficult for a human to expose it. More broadly, if you are building something that handles sensitive or secret data, the more you can do to keep humans out of the loop, the better everything will be.

Your Support is Appreciated If you d like to see more analysis of how key compromise happens, and the lessons we can learn from examining billions of certificates, please show your support by buying me a refreshing beverage. Trawling CT logs is thirsty work.

Appendix: Methodology Limitations In the interests of clarity, I feel it s important to describe ways in which my research might be flawed. Here are the things I know of that may have impacted the accuracy, that I couldn t feasibly account for.
  • Time Periods: Because time never stops, there is likely to be some slight mismatches in the numbers obtained from the various data sources, because they weren t collected at exactly the same moment.
  • Issuer-to-Organisation Mapping: It s possible that the way I mapped issuers to organisations doesn t match exactly with how crt.sh does it, meaning that counts might be skewed. I tried to minimise that by using the same data sources (the CCADB AllCertificates report) that I believe that crt.sh uses for its mapping, but I cannot be certain of a perfect match.
  • Unwarranted Grouping: I ve drawn some conclusions about the practices of the various organisations based on their general approach to certificate issuance. If a particular subordinate CA that I ve grouped into the parent organisation is managed in some unusual way, that might cause my conclusions to be erroneous. I was able to fairly easily separate out CloudFlare, AWS, and Azure, but there are almost certainly others that I didn t spot, because hoo boy there are a lot of intermediate CAs out there.

28 January 2024

Niels Thykier: Annotating the Debian packaging directory

In my previous blog post Providing online reference documentation for debputy, I made a point about how debhelper documentation was suboptimal on account of being static rather than online. The thing is that debhelper is not alone in this problem space, even if it is a major contributor to the number of packaging files you have to to know about. If we look at the "competition" here such as Fedora and Arch Linux, they tend to only have one packaging file. While most Debian people will tell you a long list of cons about having one packaging file (such a Fedora's spec file being 3+ domain specific languages "mashed" into one file), one major advantage is that there is only "the one packaging file". You only need to remember where to find the documentation for one file, which is great when you are running on wetware with limited storage capacity. Which means as a newbie, you can dedicate less mental resources to tracking multiple files and how they interact and more effort understanding the "one file" at hand. I started by asking myself how can we in Debian make the packaging stack more accessible to newcomers? Spoiler alert, I dug myself into rabbit hole and ended up somewhere else than where I thought I was going. I started by wanting to scan the debian directory and annotate all files that I could with documentation links. The logic was that if debputy could do that for you, then you could spend more mental effort elsewhere. So I combined debputy's packager provided files detection with a static list of files and I quickly had a good starting point for debputy-based packages.
Adding (non-static) dpkg and debhelper files to the mix Now, I could have closed the topic here and said "Look, I did debputy files plus couple of super common files". But I decided to take it a bit further. I added support for handling some dpkg files like packager provided files (such as debian/substvars and debian/symbols). But even then, we all know that debhelper is the big hurdle and a major part of the omission... In another previous blog post (A new Debian package helper: debputy), I made a point about how debputy could list all auxiliary files while debhelper could not. This was exactly the kind of feature that I would need for this feature, if this feature was to cover debhelper. Now, I also remarked in that blog post that I was not willing to maintain such a list. Also, I may have ranted about static documentation being unhelpful for debhelper as it excludes third-party provided tooling. Fortunately, a recent update to dh_assistant had provided some basic plumbing for loading dh sequences. This meant that getting a list of all relevant commands for a source package was a lot easier than it used to be. Once you have a list of commands, it would be possible to check all of them for dh's NOOP PROMISE hints. In these hints, a command can assert it does nothing if a given pkgfile is not present. This lead to the new dh_assistant list-guessed-dh-config-files command that will list all declared pkgfiles and which helpers listed them. With this combined feature set in place, debputy could call dh_assistant to get a list of pkgfiles, pretend they were packager provided files and annotate those along with manpage for the relevant debhelper command. The exciting thing about letting debpputy resolve the pkgfiles is that debputy will resolve "named" files automatically (debhelper tools will only do so when --name is passed), so it is much more likely to detect named pkgfiles correctly too. Side note: I am going to ignore the elephant in the room for now, which is dh_installsystemd and its package@.service files and the wide-spread use of debian/foo.service where there is no package called foo. For the latter case, the "proper" name would be debian/pkg.foo.service. With the new dh_assistant feature done and added to debputy, debputy could now detect the ubiquitous debian/install file. Excellent. But less great was that the very common debian/docs file was not. Turns out that dh_installdocs cannot be skipped by dh, so it cannot have NOOP PROMISE hints. Meh... Well, dh_assistant could learn about a new INTROSPECTABLE marker in addition to the NOOP PROMISE and then I could sprinkle that into a few commands. Indeed that worked and meant that debian/postinst (etc.) are now also detectable. At this point, debputy would be able to identify a wide range of debhelper related configuration files in debian/ and at least associate each of them with one or more commands. Nice, surely, this would be a good place to stop, right...?
Adding more metadata to the files The debhelper detected files only had a command name and manpage URI to that command. It would be nice if we could contextualize this a bit more. Like is this file installed into the package as is like debian/pam or is it a file list to be processed like debian/install. To make this distinction, I could add the most common debhelper file types to my static list and then merge the result together. Except, I do not want to maintain a full list in debputy. Fortunately, debputy has a quite extensible plugin infrastructure, so added a new plugin feature to provide this kind of detail and now I can outsource the problem! I split my definitions into two and placed the generic ones in the debputy-documentation plugin and moved the debhelper related ones to debhelper-documentation. Additionally, third-party dh addons could provide their own debputy plugin to add context to their configuration files. So, this gave birth file categories and configuration features, which described each file on different fronts. As an example, debian/gbp.conf could be tagged as a maint-config to signal that it is not directly related to the package build but more of a tool or style preference file. On the other hand, debian/install and debian/debputy.manifest would both be tagged as a pkg-helper-config. Files like debian/pam were tagged as ppf-file for packager provided file and so on. I mentioned configuration features above and those were added because, I have had a beef with debhelper's "standard" configuration file format as read by filearray and filedoublearray. They are often considered simple to understand, but it is hard to know how a tool will actually read the file. As an example, consider the following:
  • Will the debhelper use filearray, filedoublearray or none of them to read the file? This topic has about 2 bits of entropy.
  • Will the config file be executed if it is marked executable assuming you are using the right compat level? If it is executable, does dh-exec allow renaming for this file? This topic adds 1 or 2 bit of entropy depending on the context.
  • Will the config file be subject to glob expansions? This topic sounds like a boolean but is a complicated mess. The globs can be handled either by debhelper as it parses the file for you. In this case, the globs are applied to every token. However, this is not what dh_install does. Here the last token on each line is supposed to be a directory and therefore not subject to globs. Therefore, dh_install does the globbing itself afterwards but only on part of the tokens. So that is about 2 bits of entropy more. Actually, it gets worse...
    • If the file is executed, debhelper will refuse to expand globs in the output of the command, which was a deliberate design choice by the original debhelper maintainer took when he introduced the feature in debhelper/8.9.12. Except, dh_install feature interacts with the design choice and does enable glob expansion in the tool output, because it does so manually after its filedoublearray call.
So these "simple" files have way too many combinations of how they can be interpreted. I figured it would be helpful if debputy could highlight these difference, so I added support for those as well. Accordingly, debian/install is tagged with multiple tags including dh-executable-config and dh-glob-after-execute. Then, I added a datatable of these tags, so it would be easy for people to look up what they meant. Ok, this seems like a closed deal, right...?
Context, context, context However, the dh-executable-config tag among other are only applicable in compat 9 or later. It does not seem newbie friendly if you are told that this feature exist, but then have to read in the extended description that that it actually does not apply to your package. This problem seems fixable. Thanks to dh_assistant, it is easy to figure out which compat level the package is using. Then tweak some metadata to enable per compat level rules. With that tags like dh-executable-config only appears for packages using compat 9 or later. Also, debputy should be able to tell you where packager provided files like debian/pam are installed. We already have the logic for packager provided files that debputy supports and I am already using debputy engine for detecting the files. If only the plugin provided metadata gave me the install pattern, debputy would be able tell you where this file goes in the package. Indeed, a bit of tweaking later and setting install-pattern to usr/lib/pam.d/ name , debputy presented me with the correct install-path with the package name placing the name placeholder. Now, I have been using debian/pam as an example, because debian/pam is installed into usr/lib/pam.d in compat 14. But in earlier compat levels, it was installed into etc/pam.d. Well, I already had an infrastructure for doing compat file tags. Off we go to add install-pattern to the complat level infrastructure and now changing the compat level would change the path. Great. (Bug warning: The value is off-by-one in the current version of debhelper. This is fixed in git) Also, while we are in this install-pattern business, a number of debhelper config files causes files to be installed into a fixed directory. Like debian/docs which causes file to be installed into /usr/share/docs/ package . Surely, we can expand that as well and provide that bit of context too... and done. (Bug warning: The code currently does not account for the main documentation package context) It is rather common pattern for people to do debian/foo.in files, because they want to custom generation of debian/foo. Which means if you have debian/foo you get "Oh, let me tell you about debian/foo ". Then you rename it to debian/foo.in and the result is "debian/foo.in is a total mystery to me!". That is suboptimal, so lets detect those as well as if they were the original file but add a tag saying that they are a generate template and which file we suspect it generates. Finally, if you use debputy, almost all of the standard debhelper commands are removed from the sequence, since debputy replaces them. It would be weird if these commands still contributed configuration files when they are not actually going to be invoked. This mostly happened naturally due to the way the underlying dh_assistant command works. However, any file mentioned by the debhelper-documentation plugin would still appear unfortunately. So off I went to filter the list of known configuration files against which dh_ commands that dh_assistant thought would be used for this package.
Wrapping it up I was several layers into this and had to dig myself out. I have ended up with a lot of data and metadata. But it was quite difficult for me to arrange the output in a user friendly manner. However, all this data did seem like it would be useful any tool that wants to understand more about the package. So to get out of the rabbit hole, I for now wrapped all of this into JSON and now we have a debputy tool-support annotate-debian-directory command that might be useful for other tools. To try it out, you can try the following demo: In another day, I will figure out how to structure this output so it is useful for non-machine consumers. Suggestions are welcome. :)
Limitations of the approach As a closing remark, I should probably remind people that this feature relies heavily on declarative features. These include:
  • When determining which commands are relevant, using Build-Depends: dh-sequence-foo is much more reliable than configuring it via the Turing complete configuration we call debian/rules.
  • When debhelper commands use NOOP promise hints, dh_assistant can "see" the config files listed those hints, meaning the file will at least be detected. For new introspectable hint and the debputy plugin, it is probably better to wait until the dust settles a bit before adding any of those.
You can help yourself and others to better results by using the declarative way rather than using debian/rules, which is the bane of all introspection!

25 January 2024

Joachim Breitner: GHC Steering Committee Retrospective

After seven years of service as member and secretary on the GHC Steering Committee, I have resigned from that role. So this is a good time to look back and retrace the formation of the GHC proposal process and committee. In my memory, I helped define and shape the proposal process, optimizing it for effectiveness and throughput, but memory can be misleading, and judging from the paper trail in my email archives, this was indeed mostly Ben Gamari s and Richard Eisenberg s achievement: Already in Summer of 2016, Ben Gamari set up the ghc-proposals Github repository with a sketch of a process and sent out a call for nominations on the GHC user s mailing list, which I replied to. The Simons picked the first set of members, and in the fall of 2016 we discussed the committee s by-laws and procedures. As so often, Richard was an influential shaping force here.

Three ingredients For example, it was him that suggested that for each proposal we have one committee member be the Shepherd , overseeing the discussion. I believe this was one ingredient for the process effectiveness: There is always one person in charge, and thus we avoid the delays incurred when any one of a non-singleton set of volunteers have to do the next step (and everyone hopes someone else does it). The next ingredient was that we do not usually require a vote among all members (again, not easy with volunteers with limited bandwidth and occasional phases of absence). Instead, the shepherd makes a recommendation (accept/reject), and if the other committee members do not complain, this silence is taken as consent, and we come to a decision. It seems this idea can also be traced back on Richard, who suggested that once a decision is requested, the shepherd [generates] consensus. If consensus is elusive, then we vote. At the end of the year we agreed and wrote down these rules, created the mailing list for our internal, but publicly archived committee discussions, and began accepting proposals, starting with Adam Gundry s OverloadedRecordFields. At that point, there was no secretary role yet, so how I did become one? It seems that in February 2017 I started to clean-up and refine the process documentation, fixing bugs in the process (like requiring authors to set Github labels when they don t even have permissions to do that). This in particular meant that someone from the committee had to manually handle submissions and so on, and by the aforementioned principle that at every step there ought to be exactly one person in change, the role of a secretary followed naturally. In the email in which I described that role I wrote:
Simon already shoved me towards picking up the secretary hat, to reduce load on Ben.
So when I merged the updated process documentation, I already listed myself secretary . It wasn t just Simon s shoving that put my into the role, though. I dug out my original self-nomination email to Ben, and among other things I wrote:
I also hope that there is going to be clear responsibilities and a clear workflow among the committee. E.g. someone (possibly rotating), maybe called the secretary, who is in charge of having an initial look at proposals and then assigning it to a member who shepherds the proposal.
So it is hardly a surprise that I became secretary, when it was dear to my heart to have a smooth continuous process here. I am rather content with the result: These three ingredients single secretary, per-proposal shepherds, silence-is-consent helped the committee to be effective throughout its existence, even as every once in a while individual members dropped out.

Ulterior motivation I must admit, however, there was an ulterior motivation behind me grabbing the secretary role: Yes, I did want the committee to succeed, and I did want that authors receive timely, good and decisive feedback on their proposals but I did not really want to have to do that part. I am, in fact, a lousy proposal reviewer. I am too generous when reading proposals, and more likely mentally fill gaps in a specification rather than spotting them. Always optimistically assuming that the authors surely know what they are doing, rather than critically assessing the impact, the implementation cost and the interaction with other language features. And, maybe more importantly: why should I know which changes are good and which are not so good in the long run? Clearly, the authors cared enough about a proposal to put it forward, so there is some need and I do believe that Haskell should stay an evolving and innovating language but how does this help me decide about this or that particular feature. I even, during the formation of the committee, explicitly asked that we write down some guidance on Vision and Guideline ; do we want to foster change or innovation, or be selective gatekeepers? Should we accept features that are proven to be useful, or should we accept features so that they can prove to be useful? This discussion, however, did not lead to a concrete result, and the assessment of proposals relied on the sum of each member s personal preference, expertise and gut feeling. I am not saying that this was a mistake: It is hard to come up with a general guideline here, and even harder to find one that does justice to each individual proposal. So the secret motivation for me to grab the secretary post was that I could contribute without having to judge proposals. Being secretary allowed me to assign most proposals to others to shepherd, and only once in a while myself took care of a proposal, when it seemed to be very straight-forward. Sneaky, ain t it?

7 Years later For years to come I happily played secretary: When an author finished their proposal and public discussion ebbed down they would ping me on GitHub, I would pick a suitable shepherd among the committee and ask them to judge the proposal. Eventually, the committee would come to a conclusion, usually by implicit consent, sometimes by voting, and I d merge the pull request and update the metadata thereon. Every few months I d summarize the current state of affairs to the committee (what happened since the last update, which proposals are currently on our plate), and once per year gathered the data for Simon Peyton Jones annually GHC Status Report. Sometimes some members needed a nudge or two to act. Some would eventually step down, and I d sent around a call for nominations and when the nominations came in, distributed them off-list among the committee and tallied the votes. Initially, that was exciting. For a long while it was a pleasant and rewarding routine. Eventually, it became a mere chore. I noticed that I didn t quite care so much anymore about some of the discussion, and there was a decent amount of naval-gazing, meta-discussions and some wrangling about claims of authority that was probably useful and necessary, but wasn t particularly fun. I also began to notice weaknesses in the processes that I helped shape: We could really use some more automation for showing proposal statuses, notifying people when they have to act, and nudging them when they don t. The whole silence-is-assent approach is good for throughput, but not necessary great for quality, and maybe the committee members need to be pushed more firmly to engage with each proposal. Like GHC itself, the committee processes deserve continuous refinement and refactoring, and since I could not muster the motivation to change my now well-trod secretarial ways, it was time for me to step down. Luckily, Adam Gundry volunteered to take over, and that makes me feel much less bad for quitting. Thanks for that! And although I am for my day job now enjoying a language that has many of the things out of the box that for Haskell are still only language extensions or even just future proposals (dependent types, BlockArguments, do notation with ( foo) expressions and Unicode), I m still around, hosting the Haskell Interlude Podcast, writing on this blog and hanging out at ZuriHac etc.

22 January 2024

Russell Coker: Storage Trends 2024

It has been less than a year since my last post about storage trends [1] and enough has changed to make it worth writing again. My previous analysis was that for <2TB only SSD made sense, for 4TB SSD made sense for business use while hard drives were still a good option for home use, and for 8TB+ hard drives were clearly the best choice for most uses. I will start by looking at MSY prices, they aren't the cheapest (you can get cheaper online) but they are competitive and they make it easy to compare the different options. I'll also compare the cheapest options in each size, there are more expensive options but usually if you want to pay more then the performance benefits of SSD (both SATA and NVMe) are even more appealing. All prices are in Australian dollars and of parts that are readily available in Australia, but the relative prices of the parts are probably similar in most countries. The main issue here is when to use SSD and when to use hard disks, and then if SSD is chosen which variety to use. Small Storage For my last post the cheapest storage devices from MSY were $19 for a 128G SSD, now it s $24 for a 128G SSD or NVMe device. I don t think the Australian dollar has dropped much against foreign currencies, so I guess this is partly companies wanting more profits and partly due to the demand for more storage. Items that can t sell in quantity need higher profit margins if they are to have them in stock. 500G SSDs are around $33 and 500G NVMe devices for $36 so for most use cases it wouldn t make sense to buy anything smaller than 500G. The cheapest hard drive is $45 for a 1TB disk. A 1TB SATA SSD costs $61 and a 1TB NVMe costs $79. So 1TB disks aren t a good option for any use case. A 2TB hard drive is $89. A 2TB SATA SSD is $118 and a 2TB NVMe is $145. I don t think the small savings you can get from using hard drives makes them worth using for 2TB. For most people if you have a system that s important to you then $145 on storage isn t a lot to spend. It seems hardly worth buying less than 2TB of storage, even for a laptop. Even if you don t use all the space larger storage devices tend to support more writes before wearing out so you still gain from it. A 2TB NVMe device you buy for a laptop now could be used in every replacement laptop for the next 10 years. I only have 512G of storage in my laptop because I have a collection of SSD/NVMe devices that have been replaced in larger systems, so the 512G is essentially free for my laptop as I bought a larger device for a server. For small business use it doesn t make sense to buy anything smaller than 2TB for any system other than a router. If you buy smaller devices then you will sometimes have to pay people to install bigger ones and when the price is $145 it s best to just pay that up front and be done with it. Medium Storage A 4TB hard drive is $135. A 4TB SATA SSD is $319 and a 4TB NVMe is $299. The prices haven t changed a lot since last year, but a small increase in hard drive prices and a small decrease in SSD prices makes SSD more appealing for this market segment. A common size range for home servers and small business servers is 4TB or 8TB of storage. To do that on SSD means about $600 for 4TB of RAID-1 or $900 for 8TB of RAID-5/RAID-Z. That s quite affordable for that use. For 8TB of less important storage a 8TB hard drive costs $239 and a 8TB SATA SSD costs $899 so a hard drive clearly wins for the specific case of non-RAID single device storage. Note that the U.2 devices are more competitive for 8TB than SATA but I included them in the next section because they are more difficult to install. Serious Storage With 8TB being an uncommon and expensive option for consumer SSDs the cheapest price is for multiple 4TB devices. To have multiple NVMe devices in one PCIe slot you need PCIe bifurcation (treating the PCIe slot as multiple slots). Most of the machines I use don t support bifurcation and most affordable systems with ECC RAM don t have it. For cheap NVMe type storage there are U.2 devices (the enterprise form of NVMe). Until recently they were too expensive to use for desktop systems but now there are PCIe cards for internal U.2 devices, $14 for a card that takes a single U.2 is a common price on AliExpress and prices below $600 for a 7.68TB U.2 device are common that s cheaper on a per-TB basis than SATA SSD and NVMe! There are PCIe cards that take up to 4*U.2 devices (which probably require bifurcation) which means you could have 8+ U.2 devices in one not particularly high end PC for 56TB of RAID-Z NVMe storage. Admittedly $4200 for 56TB is moderately expensive, but it s in the price range for a small business server or a high end home server. A more common configuration might be 2*7.68TB U.2 on a single PCIe card (or 2 cards if you don t have bifurcation) for 7.68TB of RAID-1 storage. For SATA SSD AliExpress has a 6*2.5 hot-swap device that fits in a 5.25 bay for $63, so if you have 2*5.25 bays you could have 12*4TB SSDs for 44TB of RAID-Z storage. That wouldn t be much cheaper than 8*7.68TB U.2 devices and would be slower and have less space. But it would be a good option if PCIe bifurcation isn t possible. 16TB SATA hard drives cost $559 which is almost exactly half the price per TB of U.2 storage. That doesn t seem like a good deal. If you want 16TB of RAID storage then 3*7.68TB U.2 devices only costs about 50% more than 2*16TB SATA disks. In most cases paying 50% more to get NVMe instead of hard disks is a good option. As sizes go above 16TB prices go up in a more than linear manner, I guess they don t sell much volume of larger drives. 15.36TB U.2 devices are on sale for about $1300, slightly more than twice the price of a 16TB disk. It s within the price range of small businesses and serious home users. Also it should be noted that the U.2 devices are designed for enterprise levels of reliability and the hard disk prices I m comparing to are the cheapest available. If NAS hard disks were compared then the price benefit of hard disks would be smaller. Probably the biggest problem with U.2 for most people is that it s an uncommon technology that few people have much experience with or spare parts for testing. Also you can t buy U.2 gear at your local computer store which might mean that you want to have spare parts on hand which is an extra expense. For enterprise use I ve recently been involved in discussions with a vendor that sells multiple petabyte arrays of NVMe. Apparently NVMe is cheap enough that there s no need to use anything else if you want a well performing file server. Do Hard Disks Make Sense? There are specific cases like comparing a 8TB hard disk to a 8TB SATA SSD or a 16TB hard disk to a 15.36TB U.2 device where hard disks have an apparent advantage. But when comparing RAID storage and counting the performance benefits of SSD the savings of using hard disks don t seem to be that great. Is now the time that hard disks are going to die in the market? If they can t get volume sales then prices will go up due to lack of economy of scale in manufacture and increased stock time for retailers. 8TB hard drives are now more expensive than they were 9 months ago when I wrote my previous post, has a hard drive price death spiral already started? SSDs are cheaper than hard disks at the smallest sizes, faster (apart from some corner cases with contiguous IO), take less space in a computer, and make less noise. At worst they are a bit over twice the cost per TB. But the most common requirements for storage are small enough and cheap enough that being twice as expensive as hard drives isn t a problem for most people. I predict that hard disks will become less popular in future and offer less of a price advantage. The vendors are talking about 50TB hard disks being available in future but right now you can fit more than 50TB of NVMe or U.2 devices in a volume less than that of a 3.5 hard disk so for storage density SSD can clearly win. Maybe in future hard disks will be used in arrays of 100TB devices for large scale enterprise storage. But for home users and small businesses the current sizes of SSD cover most uses. At the moment it seems that the one case where hard disks can really compare well is for backup devices. For backups you want large storage, good contiguous write speeds, and low prices so you can buy plenty of them. Further Issues The prices I ve compared for SATA SSD and NVMe devices are all based on the cheapest devices available. I think it s a bit of a market for lemons [2] as devices often don t perform as well as expected and the incidence of fake products purporting to be from reputable companies is high on the cheaper sites. So you might as well buy the cheaper devices. An advantage of the U.2 devices is that you know that they will be reliable and perform well. One thing that concerns me about SSDs is the lack of knowledge of their failure cases. Filesystems like ZFS were specifically designed to cope with common failure cases of hard disks and I don t think we have that much knowledge about how SSDs fail. But with 3 copies of metadata BTFS or ZFS should survive unexpected SSD failure modes. I still have some hard drives in my home server, they keep working well enough and the prices on SSDs keep dropping. But if I was buying new storage for such a server now I d get U.2. I wonder if tape will make a comeback for backup. Does anyone know of other good storage options that I missed?

Next.

Previous.